In the first phase of the EU DataGrid (EDG) project, a Data Management System has been implemented and 
provided for deployment. The components of the current EDG Testbed are: a prototype of a Replica Manager 
Service built around the basic services provided by Globus, a centralised Replica Catalogue to store information 
about physical locations of files, and the Grid Data Mirroring Package (GDMP) that is widely used in various 
HEP collaborations in Europe and the US for data mirroring. During this year these services have been refined 
and made more robust so that they are fit to be used in a pre-production environment. Application users have 
been using this first release of the Data Management Services for more than a year. In the paper we present the 
components and their interaction, our implementation and experience as well as the feedback received from our 
user communities. We have resolved not only issues regarding integration with other EDG service components 
but also many of the interoperability issues with components of our partner projects in Europe and the U.S. The 
paper concludes with the basic lessons learned during this operation. These conclusions provide the motivation 
for the architecture of the next generation of Data Management Services that will be deployed in EDG during 
2003. 



1. Introduction 

Data management is one of the key features of a 
Data Grid where large amounts of data are distributed 
and/or replicated to remote sites, potentially all over 
the world. In general, a Data Grid needs to provide 
features of a pure computational Grid [7] (resource 
discovery, sharing etc.) as well as more specialised 
data management features like replica management 
which is the main focus of this article. 

The European DataGrid project (also referred 
to as EDG in this article), one of the largest Data 
Grid projects today, has a main focus on providing 
and deploying such data replication tools. Although 
the project officially started in January 2001, prototype 
implementations had already started in early 2000 
and a first data management architecture was 
presented. Thus, within the project there is already 
well-established experience in providing replication 
tools and deploying them on a large-scale testbed. 

Since interoperability of services and international 
collaboration on software development are of major 
importance for EDG as well as for other Grid projects in 
Europe, the U.S. etc., the first set of data management 
tools (i.e. replication tools) provided and presented 
here is based on established de-facto standards in 
the Grid community. In addition, for parts of the 
software presented here, EDG has development and 
deployment collaborations with partner projects like 
PPDG, DataTAG and LCG. 

In this article, we present our first set of replication 
tools that have been deployed on the European DataGrid 
testbed. These tools are included in release 1.4 
of the EDG software system and are thus regarded 
as the first prototype of the data management software 
system. Details about the architecture, software 
features and experience are given. 

The article is organised as follows. In Section 2 we 
first briefly outline the data management challenge 
and present the architecture that we established for 
EDG release 1.4. The replication tools GDMP (Grid 
Data Mirroring Package) and edg-replica-manager are 
presented in the context of the data management 
architecture. Implementation details of these tools and 
a detailed discussion of their differences in design and 
usage are presented in Section 3. Their deployment 
in several testbeds and some historical background 
about the deployment are given in Section 4. Since 
these replication tools are to be replaced by 
second generation replication tools, we briefly introduce 
the latter in Section 5; the experience 
gained in deploying EDG release 1.4 provided vital 
input for this new development. 



2. Problem Domain, Requirements, 
Architecture 



In the following section we first describe the data 
management domain with its requirements and then 
a simplified architecture for our first generation 
replication tools, i.e. the ones deployed in EDG release 1.4. 
More details on EDG releases are given in Section 4. 

2.1. Basic Requirements 

In this section we outline the main design features 
that we have chosen to meet the requirements of par- 
ticular data intensive application domains. We first 
start with a basic example and then summarise the 
basic requirements that are tackled by our replication 
tools. 

In a typical Data Grid, large amounts of data that 
are stored in read-only files need to be replicated in 
a secure and efficient way. As a basic file replication 
example we consider a Data Grid that consists 
of four sites (CERN in Switzerland, Fermilab in the 
U.S., Italy and France) as depicted in Figure 1. In 
the example, new files have been created at the site 
"Fermilab" and are now ready to be replicated to remote 
sites. A remote site (e.g. the site CERN) that 
is interested in having the files locally would like 
to replicate the newly created files to its local data 
store. End-users can then access replicas at both sites 
and might want to retrieve files with the lowest access 
latency. 




Figure 1: Basic replication example. (Recoverable figure 
labels: Italy, France, file1.dbf, and "new files produced 
at Fermilab - ready for replication".) 

Note that for simplicity we deal with read-only files 
and leave the update-synchronisation problem open 
for future work. 

In the following list the major data management 
and replication requirements are summarised and the 
need for a specific software solution is outlined. Most 
of these solutions are covered by our replication tools 
discussed in the article. 

• Files need to be transferred (copied) between 
several large data stores that reside at dis- 
tributed sites. - need for secure and efficient file 
transfer mechanism (e.g. GridFTP or equiva- 
lent) 

• Since replication implies that identical file copies 
exist, replicas need to be uniquely identified 
through logical and physical filenames. - need 
for Replica Catalogue for naming and locating 
replicas 



• Combine file transfer with file cataloguing and 
present it as an atomic transaction to the user. 
- need for replica management service 

• Large data stores use secondary and possibly 
tertiary storage devices, i.e. disk and tape sys- 
tems, respectively. - need for interaction between 
replica management service and storage service, 
i.e. mass storage interface 
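
The third requirement in the list above, presenting file transfer and cataloguing as one atomic step, can be sketched as follows. This is only a minimal illustration under stated assumptions, not the EDG implementation: the in-memory `ReplicaCatalogue` class, the `copy_and_register` function and the file names are hypothetical, and a local file copy stands in for a GridFTP transfer.

```python
import os
import shutil
import tempfile


class ReplicaCatalogue:
    """Hypothetical in-memory stand-in for a Replica Catalogue (LFN -> PFNs)."""

    def __init__(self):
        self._replicas = {}

    def register(self, lfn, pfn):
        self._replicas.setdefault(lfn, set()).add(pfn)

    def locate(self, lfn):
        return set(self._replicas.get(lfn, set()))


def copy_and_register(catalogue, lfn, source_pfn, dest_pfn):
    """Copy a replica, then register it; remove the copy if registration fails."""
    # shutil.copyfile stands in for a wide-area transfer such as GridFTP.
    shutil.copyfile(source_pfn, dest_pfn)
    try:
        catalogue.register(lfn, dest_pfn)
    except Exception:
        # Roll back so that no unregistered replica is left behind.
        os.remove(dest_pfn)
        raise
    return dest_pfn


def demo():
    """Replicate one file between two paths that play the role of two sites."""
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "fermilab_file1.dbf")
    dest = os.path.join(workdir, "cern_file1.dbf")
    with open(source, "wb") as handle:
        handle.write(b"event data")
    catalogue = ReplicaCatalogue()
    catalogue.register("file1.dbf", source)
    copy_and_register(catalogue, "file1.dbf", source, dest)
    return catalogue.locate("file1.dbf") == {source, dest}
```

From the user's point of view the operation either yields a registered replica or leaves the destination untouched, which is the property a replica management service is expected to provide.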

More detailed requirements of Data Grids (in particular 
data intensive scientific domains like High Energy 
Physics) and the data distribution problem are 
presented elsewhere. 

2.2. Basic Terminology 

In the remainder of this article, we use the following 
EDG terminology: 

• Storage Element (SE): a Storage Element is a 
data store that provides secondary and/or ter- 
tiary storage devices as well as a data trans- 
fer mechanism that allows for file transfers be- 
tween several Storage Elements connected via 
wide-area network links 

• Computing Element (CE): a Computing Element 
can be regarded as a gateway to several Worker 
Nodes (WN) that are responsible for the execu- 
tion of a user job. It is important to note that 
data that has been produced on Worker Nodes 
needs to be stored on Storage Elements in order 
to be accessible for subsequent user jobs. 

• User Interface (UI): a User Interface node is a 
machine where application users can log on and 
have access to the EDG software tools. In prin- 
ciple, the UI contains client software tools. 

• Logical File Name (LFN): an LFN uniquely 
identifies a set of identical replicas. 

• Physical File Name (PFN): identifies one file 
(replica) of a set of identical replicas. Note that 
the terminology changes from time to time but 
it is important to note that the PFN identifies 
a real data file in a Storage Element or a Computing 
Element (Worker Node). 
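
The relationship between the last two terms can be pictured as a one-to-many mapping. The sketch below is only an illustration of the terminology: the `PhysicalFileName` type, the host names and the paths are hypothetical and do not reflect the schema of any deployed catalogue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalFileName:
    """One concrete replica: the Storage Element that holds it and its path there."""
    storage_element: str
    path: str


# Hypothetical catalogue contents: one LFN mapped to two identical replicas.
replica_catalogue = {
    "file1.dbf": [
        PhysicalFileName("se.fermilab.example", "/data/cms/file1.dbf"),
        PhysicalFileName("se.cern.example", "/data/cms/file1.dbf"),
    ],
}


def replicas_of(lfn):
    """Return every PFN registered for the given LFN (empty list if unknown)."""
    return list(replica_catalogue.get(lfn, []))


def storage_elements_holding(lfn):
    """Return the Storage Elements that hold a replica of the given LFN."""
    return sorted(pfn.storage_element for pfn in replicas_of(lfn))
```

A call such as `storage_elements_holding("file1.dbf")` then answers the question a user job typically asks: which Storage Elements can serve this logical file.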



2.3. Architecture of the Data 
Management Services 

The architecture of our replication tools is based 
on a typical topology of Storage Elements, Comput- 
ing Elements (i.e. Worker Nodes) and User Interface 
nodes (UI) as outlined below. This topology is also 
realised in the EDG testbed as illustrated in Figure 2. 

Figure 2: Basic topology of EDG testbed (simplified for 
replication tools). 

File transfer is mainly done between Worker Nodes 
(of Computing Elements) and Storage Elements where 
files are permanently kept and registered with a 
Replica Catalogue (RC). In order to introduce new 
files to the Grid, they can also be transferred from 
a UI node to a Worker Node or directly to the Stor- 
age Element. All these use cases have the following in 
common: a client-server architecture is required where 
client tools have to be available on the User Interface 
as well as on Worker Nodes. Server software is mainly 
required on the Storage Elements. 

In order to meet the requirements outlined in 
Section 2.1, we have developed replica management tools 
in two steps: 

• GDMP (Grid Data Mirroring Package) [15] 
for replication (mirroring) of file 
sets between several Storage Elements. This 
was the first replication tool that was developed 
in collaboration between the European DataGrid 
project and the Particle Physics Data Grid 
(PPDG) (refer to Section 4.1 for more details 
on release dates). This replication tool also provides 
a simple interface to Mass Storage Systems 
(MSS). 



Figure 3: Grid Data Mirroring Package (GDMP) - logo. 

• edg-replica-manager was developed in the 
second year of the DataGrid project. It provides 
some added replication functionality that 
meets additional user requirements that were 
identified during the deployment of GDMP in the 
EDG testbed. In this way, both tools complement 
each other and provide the basic replication 
functionality of the first generation replication 
tools. 




Figure 4: edg-replica-manager - logo. 



Both GDMP and edg-replica-manager are part of 
the current (as of March 2003) EDG software release 
1.4. 

Both GDMP and edg-replica-manager use components 
of the Globus Toolkit 2 (TM) and thus are based 
on the current de-facto standard in Grid computing. 
Although GDMP and edg-replica-manager are architecturally 
different (client-server architecture versus 
client side tool only; details are given in Section 3), 
they have the following architectural components in 
common: 

• GridFTP [1] for efficient and secure file transfer. 

• Grid Security Infrastructure (GSI) for secure 
communication (for message passing as well as 
file transfer) 

• Replica Catalogue (both LDAP based RC as 
well as RLS) 

Since GDMP has a much richer set of functionality, 
additional features of Globus are used (e.g. Globus 
IO for client-server communication). 



3. Implementation Details and 
Comparison 

After some general architectural introduction, we 
now go more into detail with the features of GDMP 
and edg-replica-manager and how the tools are used 
in the EDG testbed. For each of the two software 
tools we give advantages and disadvantages and thus 
a critical discussion. Finally, we compare the two tools 
directly and point out for which use case they can be 
used in a most efficient way. 

3.1. Replica Catalogue Interaction 

Since both our main replication tools GDMP and 
edg-replica-manager use a replica catalogue to identify 
and locate file replicas, we first explain the replica 
catalogue interaction and its usage. 

In the Globus Toolkit, a simple, centralised replica 
catalogue is provided that is based on LDAP technology 
for storing and retrieving replica information. 
In EDG, we developed a wrapper around the Globus 
C API and provided a C++ API as well as a simple 
command line tool. 

On the EDG testbed, this LDAP based Replica Cat- 
alogue has been used but it showed several limitations 
as outlined in the list below. 

• Performance deterioration with number of entries: 
due to the way the LDAP schema 
has been chosen for the replica catalogue, we 
experienced long response times (of the order of 
30 seconds to a few minutes) for inserts into the 
Replica Catalogue as the number of entries grew. 
If the filenames (LFNs) are short (of the order 
of 10 characters), this problem does not occur 
too often, but with long filenames (of the order 
of 50 to 100 characters per LFN) there were severe 
limitations. This is also partly due to the 
overhead of the C++ wrapper. 

• Centralised, non-scalable: The LDAP based 
replica catalogue is hosted by a single LDAP 
server and thus is a single point of failure. Based 
on the previous item, it was identified that the 
catalogue did not easily scale to large amounts 
of file entries. Thus, we needed to impose re- 
strictions on the users to limit the amount of 
inserts within a certain time window. 

• No high level user command line tool for brows- 
ing: There exist a few command line tools pro- 
vided by Globus and EDG to query the cata- 
logue but there are no high level tools for brows- 
ing. An alternative is a simple Graphical User 
Interface provided by EDG but not deployed on 
the EDG testbed. Another option is a simple 
LDAP browser. 

• Schema not flexible: the LDAP based schema 
that is organised in collections, locations and 
logical files etc. does not allow for a simple ex- 
tension. 

• LFN sub-set of PFN: there is a severe limitation 
on file naming since the LFN always needs to be 
a sub-set of the physical file name. Thus, all 
Storage Elements need to have a similar direc- 
tory structure of replicas. This can be a limita- 
tion since it imposes specific and global configu- 
rations of SEs (i.e. all SEs need to be configured 
in a similar way). 
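
The naming constraint in the last item can be made concrete with a small sketch. It assumes, for illustration only, that a PFN is formed by appending the LFN to a per-site location prefix; the function names, hosts and directories are hypothetical.

```python
def pfn_for(location_prefix, lfn):
    """Build a PFN by appending the LFN to a site-specific location prefix."""
    return location_prefix.rstrip("/") + "/" + lfn.lstrip("/")


def lfn_embedded_in(lfn, pfn):
    """Check the constraint that the LFN appears as the tail of the PFN."""
    return pfn.endswith("/" + lfn.lstrip("/"))


# Two hypothetical Storage Elements: only the prefix may differ per site,
# so the directory structure below the prefix must be the same everywhere.
cern_pfn = pfn_for("gsiftp://se.cern.example/flatfiles/cms", "run1/file1.dbf")
fnal_pfn = pfn_for("gsiftp://se.fermilab.example/storage/cms/", "run1/file1.dbf")
```

Under this assumption a site cannot store a replica under a different relative path or file name without breaking the link to its LFN, which is why all Storage Elements end up needing a similar layout.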



The LDAP based Replica Catalogue tool provided by 
Globus did not provide GSI authentication. This 
was added by our partners in NorduGrid and then 
integrated into the Replica Catalogue server 
(edg-rc-server) and the Replica Catalogue API/CLI. However, 
it was not deployed on the testbed due to the way the 
GSI support was added and the low flexibility offered 
by LDAP in terms of configuration options. 

EDG and Globus have identified and discussed all 
issues above and thus provided a new solution known 
as the Replica Location Service (RLS). In later versions 
of both GDMP and edg-replica-manager, interfaces 
to the RLS have been provided and most of the 
issues outlined above were eliminated. However, on 
the EDG testbed only the LDAP based RC has been 
deployed up to now. RLS will be part of EDG release 
2.0. 



3.2. Mass Storage System Interaction 



Basic file transfer mechanisms like GridFTP allow 
for secure and efficient file transfer from one disk 
server to another. However, since large amounts of 
files are not only stored on disks but also on Mass 
Storage Systems (MSS) like Castor or HPSS, file repli- 
cation tools need to provide a mechanism to transfer 
files between Storage Elements, regardless of storage 
method used (disk or tape drives managed by a Mass 
Storage System). 

Originally, when we designed and developed 
GDMP, there was no direct Grid-enabled interface 
that allowed for such a file transfer. Thus, the fol- 
lowing solution was applied - primarily to applications 
in the High Energy Physics community: a large disk 
(or a disk pool) is considered as a first cache and all 
wide-area transfers are done from disk to disk. An 
additional file transfer is then required between the 
disk (pool) and the Mass Storage System. Thus, a file 
replication step includes a wide-area file transfer as 
well as a local staging to/from the Mass Storage Sys- 
tem. Such staging interfaces are provided by GDMP. 
For further details refer to [15]. 
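
The two-step replication described above (wide-area disk-to-disk transfer followed by local staging) can be sketched as follows. This is a simplified illustration and not GDMP's staging interface: the function names are hypothetical, plain directories play the role of the disk pools and of the Mass Storage System, and a local copy stands in for GridFTP.

```python
import os
import shutil
import tempfile


def wide_area_transfer(source_pool, dest_pool, filename):
    """Disk-to-disk copy between sites (stand-in for a GridFTP transfer)."""
    target = os.path.join(dest_pool, filename)
    shutil.copyfile(os.path.join(source_pool, filename), target)
    return target


def stage_to_mss(disk_pool, mss_dir, filename):
    """Local staging step from the disk pool into the Mass Storage System."""
    target = os.path.join(mss_dir, filename)
    shutil.copyfile(os.path.join(disk_pool, filename), target)
    return target


def replicate(source_pool, dest_pool, dest_mss, filename):
    """A full replication step: wide-area transfer followed by local staging."""
    wide_area_transfer(source_pool, dest_pool, filename)
    return stage_to_mss(dest_pool, dest_mss, filename)


def demo():
    """Create three directories, replicate one file and return the staged bytes."""
    root = tempfile.mkdtemp()
    pools = {}
    for name in ("source_pool", "dest_pool", "dest_mss"):
        pools[name] = os.path.join(root, name)
        os.mkdir(pools[name])
    with open(os.path.join(pools["source_pool"], "file1.dbf"), "wb") as handle:
        handle.write(b"event data")
    staged = replicate(pools["source_pool"], pools["dest_pool"],
                       pools["dest_mss"], "file1.dbf")
    with open(staged, "rb") as handle:
        return handle.read()
```

The disk pool acts purely as a cache here, so every replicated file is written twice at the destination site.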

In EDG release 1.4, GDMP's interface has been de- 
ployed for systems like Castor and HPSS. 

Obviously, such an additional file copy step can be 
avoided if the Mass Storage System provides a direct 
Grid-enabled interface supporting security (GSI), 
GridFTP, virtual organisations and space management. 
The Storage Resource Manager (SRM) interface 
provides part of that. Several 
solutions are currently under development within several 
projects and are supposed to be included in 
EDG release 2.x. 

3.3. GDMP - Grid Data Mirroring Package 

GDMP was a pioneer effort that started initially in 
the CMS collaboration (driven by the High Energy 
Physics community) and it was originally designed to 
support file replication in High Level Trigger studies. 
Later, it became a joint project between EDG and 
PPDG. It allows for mirroring of data between Storage 
Elements through a host subscription method. The 
basic interaction is outlined in Figure 5. 




Figure 5: File replication/mirroring with GDMP. 

GDMP has been enhanced and improved over several 
years (see Section 4.1), also based on a lot of feedback 
from application users in the EDG testbed. Below we 
list the pros and cons of GDMP that we and our users 
experienced in the deployment of GDMP. 

3.3.1. GDMP Advantages 

• Stable and scalable architecture: GDMP's architecture 
has proven to be stable and scalable to 
the needs of the basic file replication sites. 

• Reliable and robust replication: the transfer 
mechanisms are reliable and robust although we 
faced a few problems with earlier implementa- 
tions of the GridFTP library. 

• Retries on error: if files are not available at the 
time of transfer, the GDMP server takes care of 
multiple retries and thus initiating the file trans- 
fer at a later point in time. 

• File checksumming after file transfer: CRC 
checksumming is used to compare the file contents 
before and after a file transfer. 

• Complex server side logging: the GDMP server 
takes care of logging all possible events in the 
file transfer process (including staging, subscrip- 
tion etc.). This also allows for debugging of file 
transfers in case of failures. 



• Users can control file transfer via local cata- 
logues: import, export and local file catalogues 
can be used to filter files and thus reduce the 
replication process to a specific set of files. 

• Back-ends available for actions to be performed 
on replication: Mass Storage System hooks, 
automatic replication, post-replication actions etc. 
are provided by the GDMP server. 

• Mass Storage System interface: basically for 
Castor, HPSS or equivalent 
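
Two of the advantages listed above, retries on error and CRC checksumming after the transfer, combine naturally. The sketch below shows one way this could look; it is not GDMP code. The function names, the retry count and the use of `zlib.crc32` over whole files are assumptions made for the example, and the "transfer" is an arbitrary callable.

```python
import os
import shutil
import tempfile
import time
import zlib


def crc32_of(path, chunk_size=65536):
    """Compute the CRC-32 of a file without loading it into memory at once."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def transfer_with_retries(transfer, source, dest, max_attempts=3, delay=0.0):
    """Run transfer(source, dest) until the copy verifies; return the attempt number."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            transfer(source, dest)
            if crc32_of(source) != crc32_of(dest):
                raise OSError("CRC mismatch after transfer")
            return attempt
        except OSError as error:
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay)
    raise RuntimeError(f"transfer failed after {max_attempts} attempts") from last_error


def demo():
    """Use a transfer that fails once, as if the source were not yet available."""
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "source.dbf")
    dest = os.path.join(workdir, "dest.dbf")
    with open(source, "wb") as handle:
        handle.write(b"x" * 100000)
    calls = {"count": 0}

    def flaky_copy(src, dst):
        calls["count"] += 1
        if calls["count"] == 1:
            raise OSError("source not yet available")
        shutil.copyfile(src, dst)

    return transfer_with_retries(flaky_copy, source, dest)
```

In a server-side tool the delay between attempts would typically be much longer, since the point is to initiate the transfer again at a later time.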

3.3.2. GDMP Disadvantages 

• Designed for site rather than point-to-point 
replication: GDMP was designed to handle mir- 
roring among sites and not for point-to-point 
replication. Point-to-point replication was an- 
other requirement that appeared during the us- 
age of GDMP in the EDG testbed. In order to 
respond to this request, the edg-replica-manager 
has been provided. 

• Several steps involved for replication: due to 
the fact that GDMP can mirror entire directories 
with their files based on a subscription 
model, three commands need to be executed in 
order to register files in a local catalogue, get 
them published to remote sites and then replicate 
them. Several users thought that this involved 
too many steps: this has again been addressed 
in the edg-replica-manager at the cost 
that no subscription is available. 

• Difficult configuration: since GDMP has a 
rather complex set of features and offers sup- 
port for multiple VOs on one server, the con- 
figuration is rather complex ("difficult"). Some 
improvements could be made as regards the con- 
figuration and user authentication mechanism. 

• No space management provided: space manage- 
ment is beyond the scope of GDMP and is the 
responsibility of the Storage Element service (or 
SRM). 

• Error messages not always clear 

• Error recovery sometimes requires manual 
intervention 
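
The three steps mentioned in the list above (register files locally, publish them to subscribed sites, replicate them) can be modelled in a few lines. This is a toy in-memory model of a subscription scheme and not GDMP itself: the `Site` class, its method names and the catalogue layout are hypothetical.

```python
class Site:
    """Toy model of a site taking part in subscription-based mirroring."""

    def __init__(self, name):
        self.name = name
        self.local_catalogue = set()   # files held at this site
        self.export_catalogue = set()  # registered files not yet published
        self.import_catalogue = {}     # file name -> producing Site, awaiting transfer
        self.subscribers = []

    def subscribe(self, consumer):
        """Let another site receive notifications about newly published files."""
        self.subscribers.append(consumer)

    def register(self, filename):
        """Step 1: record a newly produced file in the local catalogue."""
        self.local_catalogue.add(filename)
        self.export_catalogue.add(filename)

    def publish(self):
        """Step 2: announce all registered files to every subscriber."""
        for consumer in self.subscribers:
            for filename in self.export_catalogue:
                consumer.import_catalogue[filename] = self
        self.export_catalogue.clear()

    def replicate(self, wanted=lambda name: True):
        """Step 3: fetch announced files, optionally filtered by the caller."""
        fetched = []
        for filename in sorted(self.import_catalogue):
            producer = self.import_catalogue[filename]
            if wanted(filename) and filename in producer.local_catalogue:
                self.local_catalogue.add(filename)  # stands in for the file transfer
                fetched.append(filename)
        for filename in fetched:
            del self.import_catalogue[filename]
        return fetched
```

For example, after `fermilab.subscribe(cern)`, `fermilab.register("file1.dbf")` and `fermilab.publish()`, a call to `cern.replicate()` brings the file into CERN's local catalogue. The `wanted` filter corresponds to the idea of restricting replication to a specific set of files.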

For more background on GDMP, we refer the reader 
to the GDMP references cited in Section 2.3. 

3.4. edg-replica-manager 

The edg-replica-manager [11] extends the replica 
management library in Globus Toolkit (TM) 2.0 and 
is a client side tool rather than a client-server system. 
It allows for replication and registration of files in a 
Replica Catalogue and works with the LDAP based 
Globus Replica Catalog as well as the Replica Location 
Service (available in VDT 1.1.7 or higher). 
In addition, it uses GDMP's staging interface to stage 
files to a Mass Storage System. The edg-replica-manager 
uses the EDG Replica Catalogue API (in 
C++). 

The edg-replica-manager uses the information ser- 
vice (Globus' MDS is used in EDG release 1.4) to find 
out storage locations on a given Storage Element. It is 
assumed that basic account management on a Storage 
Element is done via tools provided by the GDMP con- 
figuration part. Thus, the edg-replica-manager takes 
this into account and finds out where to store files 
of the particular virtual organisation a user belongs 
to. In this way, an end-user only needs to specify the 
host name of a given Storage Element and the edg- 
replica-manager then takes care of finding the exact 
source and destination as well as triggering a staging 
operation to/from a Mass Storage System. 
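
The lookup described in this paragraph, from a Storage Element host name and the user's virtual organisation to a concrete storage location, can be sketched as follows. The dictionary layout is purely hypothetical and does not reproduce the MDS schema; the host names, directories and function name are assumptions for illustration.

```python
# Hypothetical snapshot of what an information service might publish:
# Storage Element host -> virtual organisation -> storage directory.
INFO_SERVICE = {
    "se.cern.example": {"cms": "/flatfiles/cms", "atlas": "/flatfiles/atlas"},
    "se.fermilab.example": {"cms": "/storage/cms"},
}


def resolve_destination(info_service, se_host, vo, lfn):
    """Turn (SE host, VO, LFN) into a full destination URL for the transfer."""
    try:
        storage_dir = info_service[se_host][vo]
    except KeyError:
        raise LookupError(f"no storage area for VO {vo!r} on {se_host!r}") from None
    return f"gsiftp://{se_host}{storage_dir}/{lfn}"
```

With such a lookup the end-user supplies only the host name, for instance `resolve_destination(INFO_SERVICE, "se.cern.example", "cms", "file1.dbf")`, and never needs to know the directory layout of the Storage Element.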

The basic interaction is outlined in Figure 6. Note 
that this tool can be used to transfer files from any 
of the nodes in the EDG testbed (i.e. User Interface 
machine, Worker Node, Storage Element). A simple 
command line interface as well as a C++ interface are 
provided [11]. 




Figure 6: edg-replica-manager on the EDG Testbed 

Similar to GDMP, we list the pros and cons of the 
edg-replica-manager in the following two subsections. 

3.4.1. edg-replica-manager Advantages 

• User friendly interface: since the replica manager 
has a rather small number of features and 
initial feedback from our user community had 
been gathered through GDMP, this tool provides 
a user friendly interface. 

• Functional: the basic requirement of a replication 
tool is satisfied, including that it hides several 
details of storage locations, i.e. detailed 
storage locations are not required for storing and 
retrieving files - only the LFN is required rather 
than full PFNs. 

• Third party transfer available: using the fea- 
tures of GridFTP, a third-party transfer can be 
triggered from any node where the edg-replica- 
manager client is installed. 

• GSI authorisation available for Replica Cata- 
logue: due to our modifications to the LDAP 
based Replica Catalogue server, we also enabled 
GSI authentication for the edg-replica-manager. 
For RLS, GSI authentication is the default op- 
tion. 

• Easy configuration: only a few client side pa- 
rameters need to be set in order to configure the 
interaction with the Replica Catalogue and the 
Mass Storage System (i.e. GDMP's interface to 
the MSS). 

3.4.2. edg-replica-manager Disadvantages 

• Error messages not always clear 

• No roll-back; no transactions: since edg-replica-manager 
does not have a corresponding server 
(as is the case for GDMP), no roll-back 
or transactions are implemented. In addition, 
there is neither file checksumming nor centralised 
logging. In summary, the added value that one 
has with a client-server tool is not gained here. 

• No complete interface to replica catalogue 
schema: logical file information like file size or 
CRC checksums is not supported directly. 
One needs to use the EDG C++ interface to the 
Replica Catalogue. 

For more details on the user interface of the 
edg-replica-manager, refer to its documentation. 

3.5. Comparison GDMP - 
edg-replica-manager 

A schematic comparison of the two replication tools 
is given in Table I and is intended to assist in choosing which 
tool to use for a particular replication requirement. 

To sum up, the main difference between the "older" 
GDMP and the "younger" edg-replica-manager is that 
the former is a client-server tool with a rich set of 
functionality whereas the latter is a newer client side tool 
only, with a more streamlined but smaller set of 
functionality. 

GDMP                              edg-replica-manager
--------------------------------  ---------------------------------------
Replication between SEs only      Replication between SEs, UI or CE to SE
Replicates sets of files          Replicates single files
Provides MSS interface            Uses GDMP's MSS interface
Client-server                     Client side only
Logical file attributes (size,
  timestamp, etc.; extensible)
Subscription model
Event notification
CRC file size check
Support for Objectivity/DB
Automatic retries
Support for multiple VOs

Table I: Comparison: GDMP versus edg-replica-manager 



4. Deployment Experience in Several 
Testbeds 



Our replication software tools have been deployed in 
various testbeds as we point out below. The software 
itself is mainly distributed as part of the European 
DataGrid software release, also referred to as EDG 
release. The EDG release contains all EDG software, 
ranging from workload, data, information, fabric 
and mass storage management, i.e. it includes all 
our replication tools as well as other software. The 
latest version that has been deployed on the EDG 
testbed is EDG release 1.4, our main reference point 
in the discussion in this article. 



4.2. Deployment in Several Testbeds 

Our replication tools were not only used and de- 
ployed in the EDG testbed, but also in a few other 
environments as we point out below. Note that GDMP 
was also part of an early VDT release. 

GDMP was first used for High Level Trigger stud- 
ies ("production") of HEP experiments in 2000/2001 
(replication between SEs). In this environment, we 
gained our first experience and used the tool in a "pro- 
duction like" environment. 

Later, GDMP was introduced to the European 
DataGrid testbed which was originally set up in au- 
tumn 2001. This also resulted in some changes of user 
requirements: all user commands needed to be exe- 
cuted from a User Interface machine or from Worker 
Nodes of a Computing Element. This caused some 
redesign of the GDMP architecture. 

Both tools (GDMP and edg-replica-manager) are 
used in European and U.S. testbeds: 



• EDG: ATLAS, CMS, ALICE and LHCb stress tests 


• WorldGrid: WorldGrid is the first transatlantic 
testbed where interoperability between European 
and U.S. Grid tools has been demonstrated. 

As regards our replication tools: edg-replica-manager 
was used by both CMS and ATLAS 
applications to move and replicate files between 
U.S. and European sites. GDMP was used as 
part of the CMS MOP environment to replicate 
sets of files produced at several sites. 

• LCG-0: deployed and inter-operable with the 
WorldGrid and GLUE testbeds. 



4.1. History of Replication Tool 
Development 

Within the last three years, we gained a lot of experience 
with data replication tools in a Grid environment. 
The complete history of the development, with 
the basic features that have been included in each 
release of the software, is given as the replication tool 
life cycle in Table II. Note that this table also shows 
when we started the edg-replica-manager releases. 

Note that Globus 2.2.x supports neither the replica 
catalog nor the replica management libraries. Therefore, 
edg-replica-manager has not been completely 
ported to Globus 2.2.4, but we succeeded with GDMP 
since it depends only on globus-replica-catalog 
and EDG provided a special version of that 
library. 



5. Conclusion and Future Work 

Within the last three years, we gained a lot of experience 
in developing and deploying replication tools 
in a Data Grid environment. Our first generation 
tools (GDMP, edg-replica-manager, and the replica 
catalogue API and command line interface) have been 
successfully used in some "production like" environments 
as well as in several testbeds in Europe and in the U.S. 
All the tools are included in EDG release 1.4 where 
they are currently deployed on the EDG application 
testbed. 

The tools we designed and developed cover client-server 
as well as client side tools and thus provide 
a wide range of possible design choices. Whereas a 
client-server tool allows for complex functionality 
(including fault tolerance, retries, server side logging, 
server side file processing etc.), its configuration is 
comparatively more complex than for simple client tools 
like the edg-replica-manager. The tradeoff in such 
client-side-only solutions is that many features that 
one might want to have for fault tolerance and 
reliability are missing. We also gained experience with 
providing configuration options to our software tools: 
in a complex testbed it is of major importance to keep 
the configuration as simple as possible. In the current 
release, users experience some difficulties with relatively 
complex configuration options. 

Version                  Date            Features
-----------------------  --------------  ------------------------------------------
GDMP 1.x                 September 2000  First prototype of basic SE-SE replication
                                         of Objectivity files;
                                         based on Globus 1.1.3
GDMP 2.x                 October 2001    General file replication tools (not only
                                         Objectivity files);
                                         uses GridFTP + Globus Replica Catalogue;
                                         full Mass Storage Support
GDMP 3.x                 April 2002      Split into client and server side tool;
                                         improved server functionality/security;
                                         support for multiple VOs
edg-replica-manager 1.x  May 2002        Based on globus-replica-management and
                                         globus-replica-catalog libraries
edg-replica-manager 2.x  December 2002   Several improvements;
                                         Replica Location Service (RLS) binding
GDMP 3.2.x               October 2002    RLS + several improvements
GDMP 4.0                 February 2003   Globus 2.2.4 + RH 7.3 gcc 2.95.2 + gcc 3.2

Table II: History of replication tools with their versions and features 

The experience we gained from our first generation 
tools is used in the development of the second generation 
replication tools that will be provided by EDG in 
release 2.0. In particular, new services like a Replica 
Location Service + Replica Metadata Catalogue, an 
Optimization service etc. will be added to the basic 
functionality of the second generation tools. 